
feat(model): add quantization support for LLM2Vec text encoder #12

Open
Lee-Jun-Hyuk-37 wants to merge 1 commit into nv-tlabs:main from Lee-Jun-Hyuk-37:feat/llm2vec-quantization

Conversation

@Lee-Jun-Hyuk-37

Thank you for this excellent project.

Summary

  • Add a KIMODO_QUANTIZE env var to load the Llama-3-8B text encoder with reduced precision via bitsandbytes
  • Supported modes: 4bit (NF4, ~5GB VRAM) and 8bit (INT8, ~9GB VRAM)
  • Quantized models are pinned to their device, since calling .to() on quantized weights raises an error
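For reference, the env-var dispatch could be sketched roughly as below. The function name `quantization_kwargs` and the exact return shape are assumptions for illustration, not the PR's actual code; the real loader would pass the equivalent settings to `transformers` (e.g. via `BitsAndBytesConfig`):

```python
import os

def quantization_kwargs(env=None):
    """Map KIMODO_QUANTIZE to bitsandbytes loading options (sketch)."""
    env = os.environ if env is None else env
    mode = env.get("KIMODO_QUANTIZE", "").lower()
    if mode == "4bit":
        # NF4 4-bit: ~5GB VRAM for the Llama-3-8B encoder
        return {"load_in_4bit": True, "bnb_4bit_quant_type": "nf4"}
    if mode == "8bit":
        # INT8 8-bit: ~9GB VRAM
        return {"load_in_8bit": True}
    # Unset or unrecognized: full precision (~17GB, current behavior)
    return {}
```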

Motivation

Kimodo currently requires ~17GB VRAM, which limits it to high-end GPUs (A100, RTX 3090/4090). Many consumer GPUs have 8-12GB VRAM, which is enough for the diffusion model (~1GB) but not for the full-precision text encoder (~16GB).

This change lets users trade a small amount of text embedding quality for significantly lower VRAM usage, making Kimodo accessible on a much wider range of hardware.

Usage

KIMODO_QUANTIZE=4bit kimodo_gen "A person walks forward." --output motion
KIMODO_QUANTIZE=8bit kimodo_gen "A person walks forward." --output motion

Requires: pip install bitsandbytes accelerate
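The device-pinning behavior mentioned above could look something like this minimal sketch. `pin_to_device` is an assumed name, not necessarily what the PR implements; the idea is only that bitsandbytes rejects device moves on quantized weights, so downstream `.to()` calls become no-ops after loading:

```python
def pin_to_device(model):
    """Pin a quantized model: turn later .to() calls into no-ops (sketch)."""
    def pinned_to(*args, **kwargs):
        # Ignore device/dtype moves; the quantized model stays where
        # bitsandbytes placed it at load time.
        return model
    model.to = pinned_to
    return model
```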

Add KIMODO_QUANTIZE env var to load the Llama-3-8B text encoder
with reduced precision via bitsandbytes:

  KIMODO_QUANTIZE=4bit  - NF4 4-bit (~5GB VRAM, down from ~17GB)
  KIMODO_QUANTIZE=8bit  - INT8 8-bit (~9GB VRAM)

This makes Kimodo usable on consumer GPUs (8-12GB) while retaining
full text-prompt support. The quantized model is pinned to its device
to avoid errors from .to() calls on quantized weights.

Requires: pip install bitsandbytes accelerate
